55 research outputs found
Cure the headache of Transformers via Collinear Constrained Attention
As the rapid progression of practical applications based on Large Language
Models continues, the importance of extrapolating performance has grown
exponentially in the research domain. In our study, we identified an anomalous
behavior in Transformer models that had been previously overlooked, leading to
chaos around the closest tokens, which carry the most important information.
We've coined this discovery the "headache of Transformers". To address this at
its core, we introduced a novel self-attention structure named Collinear
Constrained Attention (CoCA). This structure can be seamlessly integrated with
existing extrapolation, interpolation methods, and other optimization
strategies designed for traditional Transformer models. We have achieved
excellent extrapolation performance even at 16 to 24 times the sequence
length during inference, without any fine-tuning of our model. We have also
enhanced CoCA's computational and spatial efficiency to ensure its
practicality. We plan to open-source CoCA shortly. In the meantime, we've made
our code available in the appendix for reproducing the experiments. Comment: 16 pages, 6 figure
Once is Enough: A Light-Weight Cross-Attention for Fast Sentence Pair Modeling
Transformer-based models have achieved great success on sentence pair
modeling tasks, such as answer selection and natural language inference (NLI).
These models generally perform cross-attention over input pairs, leading to
prohibitive computational costs. Recent studies propose dual-encoder and late
interaction architectures for faster computation. However, the balance between
the expressiveness of cross-attention and computational speedup still needs to
be better coordinated. To this end, this paper introduces a novel paradigm, MixEncoder, for
efficient sentence pair modeling. MixEncoder involves a light-weight
cross-attention mechanism. It conducts query encoding only once while modeling
the query-candidate interaction in parallel. Extensive experiments conducted on
four tasks demonstrate that our MixEncoder can speed up sentence pairing by
over 113x while achieving performance comparable to the more expensive
cross-attention models. Comment: Accepted to EMNLP 202
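The "encode the query once, score candidates in parallel" idea can be illustrated with a minimal sketch. This is not the MixEncoder architecture; the stand-in encoder, the single dot-product attention step, and all function names (`encode_query`, `attend`, `score_candidates`) are assumptions made purely for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def encode_query(query_tokens):
    """Hypothetical stand-in encoder: returns the token vectors unchanged."""
    return [list(tok) for tok in query_tokens]

def attend(candidate_vec, query_states):
    """One candidate vector attends over the cached query states."""
    weights = softmax([dot(candidate_vec, q) for q in query_states])
    dim = len(candidate_vec)
    return [sum(w * q[d] for w, q in zip(weights, query_states)) for d in range(dim)]

def score_candidates(query_tokens, candidates):
    """Encode the query a single time, then score every candidate against it."""
    query_states = encode_query(query_tokens)  # computed once, reused below
    return [dot(c, attend(c, query_states)) for c in candidates]

query = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = score_candidates(query, candidates)
```

Because the per-candidate work only reads the cached query states, the loop over candidates could be batched or run in parallel without re-encoding the query.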
Dynamically Relative Position Encoding-Based Transformer for Automatic Code Edit
Adapting Deep Learning (DL) techniques to automate non-trivial coding
activities, such as code documentation and defect detection, has been
intensively studied recently. Learning to predict code changes is one of the
popular and essential investigations. Prior studies have shown that DL
techniques such as Neural Machine Translation (NMT) can benefit meaningful code
changes, including bug fixing and code refactoring. However, NMT models may
encounter a bottleneck when modeling long sequences and are thus limited in
accurately predicting code changes. In this work, we design a Transformer-based
approach, considering that Transformer has proven effective in capturing
long-term dependencies. Specifically, we propose a novel model named DTrans.
To better incorporate the local structure of code, i.e., statement-level
information in this paper, DTrans is designed with dynamically relative
position encoding in the multi-head attention of Transformer. Experiments on
benchmark datasets demonstrate that DTrans can more accurately generate patches
than the state-of-the-art methods, increasing the performance by at least
5.45%-46.57% in terms of the exact match metric on different datasets.
Moreover, DTrans can locate the lines to change with 1.75%-24.21% higher
accuracy than the existing methods.
Precursors and Pathways Leading to Enhanced Secondary Organic Aerosol Formation during Severe Haze Episodes
Publisher Copyright: © 2021 American Chemical Society. Molecular analyses help to investigate the key precursors and chemical processes of secondary organic aerosol (SOA) formation. We obtained the sources and molecular compositions of organic aerosol in PM2.5 in winter in Beijing by online and offline mass spectrometer measurements. Photochemical and aqueous processing were both involved in producing SOA during the haze events. Aromatics, isoprene, long-chain alkanes or alkenes, and carbonyls such as glyoxal and methylglyoxal were all important precursors. The enhanced SOA formation during the severe haze event was predominantly contributed by aqueous processing that was promoted by elevated amounts of aerosol water for which multifunctional organic nitrates contributed the most followed by organic compounds having four oxygen atoms in their formulae. The latter included dicarboxylic acids and various oxidation products from isoprene and aromatics as well as products or oligomers from methylglyoxal aqueous uptake. Nitrated phenols, organosulfates, and methanesulfonic acid were also important SOA products but their contributions to the elevated SOA mass during the severe haze event were minor. Our results highlight the importance of reducing nitrogen oxides and nitrate for future SOA control. Additionally, the formation of highly oxygenated long-chain molecules with a low degree of unsaturation in polluted urban environments requires further research. Peer reviewe
Dense Feature Aggregation and Pruning for RGBT Tracking
How to perform effective information fusion of different modalities is a core
factor in boosting the performance of RGBT tracking. This paper presents a
novel deep fusion algorithm based on the representations from an end-to-end
trained convolutional neural network. To deploy the complementarity of features
of all layers, we propose a recursive strategy to densely aggregate these
features that yield robust representations of target objects in each modality.
In different modalities, we propose to prune the densely aggregated features of
all modalities in a collaborative way. Specifically, we employ the operations
of global average pooling and weighted random selection to perform channel
scoring and selection, which could remove redundant and noisy features to
achieve more robust feature representation. Experimental results on two RGBT
tracking benchmark datasets suggest that our tracker clearly achieves
state-of-the-art performance against other RGB and RGBT tracking methods. Comment: arXiv admin note: text overlap with arXiv:1811.0985
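The channel scoring and selection step named in this abstract (global average pooling followed by weighted random selection) can be sketched as follows. This is an illustrative reading, not the authors' implementation: the shift that makes the weights positive, the sampling without replacement, and the names `channel_scores`, `weighted_select`, and `prune` are all assumptions.

```python
import random

def channel_scores(features):
    """Global average pooling: one mean activation per channel.
    `features` is a list of channels, each a list of rows of floats."""
    scores = []
    for channel in features:
        values = [v for row in channel for v in row]
        scores.append(sum(values) / len(values))
    return scores

def weighted_select(scores, keep, rng):
    """Sample `keep` distinct channel indices, favouring high scores."""
    low = min(scores)
    weights = [s - low + 1e-8 for s in scores]  # shift to positive weights (assumption)
    remaining = list(range(len(scores)))
    chosen = []
    for _ in range(keep):
        pick = rng.choices(remaining, weights=[weights[i] for i in remaining], k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)  # without replacement
    return sorted(chosen)

def prune(features, keep, rng):
    """Zero out every channel that was not selected."""
    kept = set(weighted_select(channel_scores(features), keep, rng))
    return [ch if i in kept else [[0.0] * len(row) for row in ch]
            for i, ch in enumerate(features)]

# Four toy 2x2 channels with constant activations 1.0 .. 4.0.
toy = [[[float(c)] * 2 for _ in range(2)] for c in range(1, 5)]
pruned = prune(toy, keep=2, rng=random.Random(0))
```

Random selection weighted by score keeps some stochasticity in which channels survive, whereas a plain top-k would always keep the same channels for the same input.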
Life-Cycle-Based Multicriteria Sustainability Evaluation of Industrial Parks: A Case Study in China
Along with increasing concerns about environmental protection and global warming mitigation, new industrial organization modes such as “Ecoindustrial Park” and “Low Carbon Industrial Park” are emerging. Since ecoindustrial parks and low carbon industrial parks may offer multifaceted benefits to the users, it naturally follows that the sustainability assessment of the industrial parks ought to adopt a multicriteria methodology. In this paper, a multicriteria sustainability evaluation framework is proposed in combination with life cycle analysis and applied to a low carbon and high end industrial park (LCHE) in Beijing, China. Results show that the LCHE industrial park can contribute to both energy saving and greenhouse gas emission mitigation compared with other industrial parks. In terms of economic performance, although the economic profits are considerable, the investment per constructed area is relatively high. The results of the sustainability analysis of the LCHE industrial park can thus shed light on future upgrading of industrial parks.
SYNONYMOUS CODON USAGE BIAS AND OVEREXPRESSION OF A SYNTHETIC xynB GENE FROM Aspergillus niger NL-1 IN Pichia pastoris
To further improve the expression level of recombinant xylanase in Pichia pastoris, the xynB gene, encoding the mature peptide from Aspergillus niger NL-1, was designed and synthesized based on the synonymous codon bias of P. pastoris and optimized G+C content. A total of 155 nucleotides were changed, and the GC content decreased from 57.7% to 43.6%. The synthetic xynB was inserted into pPICZαA and then integrated into P. pastoris GS115. The activity of the recombinant xylanase reached 1414.7 U/mL, induced with 0.8% methanol after 14-day cultivation at a temperature of 28 °C in shake flasks, which was 267% higher than that of the native gene. Furthermore, the maximum xylanase activity of 20424.2 U/mL was obtained by high-density fermentation in a 5-L fermenter, which was the highest xylanase expression in P. pastoris yet reported. The recombinant xylanase had its optimal activity at a pH of 5.0 and temperature of 50 °C. The recombinant xylanase was stable over a pH range of 4.5 to 8.0. Thus, this report provides an industrial means to produce the recombinant xylanase in P. pastoris.
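The two quantities reported in this abstract, the number of changed nucleotides and the GC content, are straightforward to compute. The sketch below uses a made-up six-base toy sequence, not the xynB gene; the function names are hypothetical. The toy swap replaces GGC with GGT (both glycine) and GCC with GCT (both alanine), so the encoded peptide is unchanged while the GC content drops.

```python
def gc_content(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)

def count_changes(native, synthetic):
    """Number of positions at which two equal-length sequences differ."""
    if len(native) != len(synthetic):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(native, synthetic))

# Toy example (not real xynB data): Gly-Ala encoded two different ways.
native = "GGCGCC"
synthetic = "GGTGCT"
changed = count_changes(native, synthetic)
gc_before = gc_content(native)
gc_after = gc_content(synthetic)
```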
Study on the Relationship between Early Shrinkage Cracking and Mechanical Properties of Nano-Clay Cement Mortar Based on Fractal Theory
In order to study the influence of nano-clay on the crack resistance of cement-based materials, two kinds of nano-metakaolin (NMK) and two kinds of nano-attapulgite clay (NMA) were considered. The early cracking process and mechanical properties of nano-clay cement mortar (NCM) were studied by using a plate knife-edge constraint test. Based on fractal theory, the distribution characteristics of NCM surface cracks were revealed, and the calculation method for NCM maximum crack width was given. The results show that the cracking time of the NMK-3 specimen is 2 and 6 h later than that of NMK-1 and NMA-2, respectively; the smaller the particle size of nano-clay, the earlier the cracking time of the specimen. However, nano-clay effectively inhibited the expansion of mortar cracks, and the cracks on the surface of NCM were thin and sparse. At 28 days, the maximum crack width of NMK-3 was 46.7% and 33.3% lower than that of NMK-1 and NMA-2, respectively. NMK had the best improvement effect on the mechanical properties of cement mortar. The smaller the particle size, the more pronounced the improvement effect. The flexural strength ratio and compressive strength ratio at 7 and 28 days are 76.7%, 67.4%, and 61.2%, respectively. The distribution of surface cracks on NCM has fractal characteristics, and the fractal dimension of surface cracks is smaller than that of ordinary cement mortar. The larger the particle size of nano-clay, the smaller the fractal dimension of cracks. The quantitative relationship between fracture fractal dimension and NCM elastic modulus and shrinkage tensile stress is established.
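The abstract does not say how the fractal dimension of the surface cracks was estimated. One common estimator is box counting, sketched below under that assumption: the crack is represented as a set of integer pixel coordinates, boxes of several sizes are counted, and the dimension is the least-squares slope of log N(s) against log(1/s). The function names and the toy shapes are hypothetical.

```python
import math

def box_count(points, size):
    """Number of size-by-size grid boxes containing at least one crack pixel."""
    return len({(x // size, y // size) for x, y in points})

def box_counting_dimension(points, sizes):
    """Least-squares slope of log N(s) versus log(1/s)."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(points, s)) for s in sizes]
    n = len(sizes)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Sanity checks on idealised shapes: a thin straight crack and a filled patch.
line = [(i, i) for i in range(64)]
patch = [(x, y) for x in range(16) for y in range(16)]
dim_line = box_counting_dimension(line, [1, 2, 4, 8, 16])
dim_patch = box_counting_dimension(patch, [1, 2, 4, 8])
```

A thin straight crack gives a dimension close to 1 and a filled patch close to 2; a real crack network would be expected to fall between these two values.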